ABSTRACT

The examination of credibility effects in predicting achievement, an
important step in the study of source credibility effects on attributions,
substantiates Birnbaum's findings that variation in the credibility of
information can be represented by changes in the weight of the information.
Undergraduate subjects (N=65) predicted the performance of hypothetical
students on a comprehensive college final exam of medium difficulty, based on
IQ scores and study time (effort) ratings of varying reliability. The
reliability of effort and ability (IQ) information was manipulated to test
averaging and multiplying models for differences in prediction. Results
indicated that increased reliability of either ability or effort information
gave that information a greater effect on judged performance. Increased
reliability of one type of information lessened the effect of the other type
of information. The findings are consistent with an averaging model in which
reliability of information influences its weight, but are inconsistent with a
multiplying model. (NRB)
EFFECTS OF INFORMATION RELIABILITY IN PREDICTING
TASK PERFORMANCE USING ABILITY AND EFFORT

In the literature on achievement motivation and achievement attributions, a
commonly stated hypothesis is that performance is predicted to be a
multiplicative function of effort and ability. For example, Heider (1958,
p. 83) stated the hypothesis as follows:

"The personal constituents, namely power and trying, are related
as a multiplicative combination, since the effective personal
force is zero if either of them is zero. For instance, if a
person has the ability, but does not try at all, he will make no
progress toward the goal."

Heider here was discussing how people view ability and effort as determining per- 
formance (i.e., "naive psychology"). 

Support for the multiplicative combination rule for judgments of the
performance of a hypothetical other has been obtained by Anderson and Butzin
(1974) and Kun, Parsons, and Ruble (1974). Anderson and Butzin, however,
suggested that an averaging model in which the weights were allowed to vary
with the scale values might provide a competing representation of judgments of
performance based on ability and effort information. Neither Anderson and
Butzin's work nor that of Kun et al. was designed to test the possibility that
an averaging model might also accommodate the results. Recent work by Singh,
Gupta and Dalal (1979) concluded that in the Indian culture the combination of
ability and motivation can be represented by an averaging model. They
attributed their results to a difference between the Indian and U.S. cultures.
Results reported by Surber with U.S. populations (in press; Note 1) also
suggested that the averaging hypothesis merited further examination.

Previous tests of the averaging model by Singh et al. and Surber relied on
the "set size" effect. That is, an averaging model predicts that a piece of
information presented alone should have a larger effect than when presented in
combination with other information. Unfortunately, there is a class of additive
models that are capable of predicting effects of the number of pieces of
information (T. Anderson & Birnbaum, 1976; Gollob, Rossman & Abelson, 1973).
Thus, the set size effects of Singh et al. and Surber are not definitive. The
present work provides a more definitive test of the averaging model by also
varying the credibility of information about ability and effort. Recent work on
source credibility by Birnbaum (1976; Birnbaum & Stegner, 1979; Birnbaum, Wong
& Wong, 1976) provides evidence that variation in the credibility of
information can be represented by a change in the weight of the information.
The present experiment extends Birnbaum's analysis of source credibility
effects to predictions of academic performance. In the present experiment,
subjects judged the performance of hypothetical students on a comprehensive
final exam in a college course. Information about each hypothetical student's
intellectual ability was given in terms of an IQ score from one of three
different IQ tests described as varying in their reliability. Information about
effort was given in terms of estimates of the student's study time for the
exam. The information about study time also varied in reliability. Based on
this information, subjects predicted the students' performance on the exam.
Although source credibility has long been a topic of interest to social
psychologists (e.g., Cohen, 1964; McGuire, 1968), with few exceptions work on
source credibility has not addressed predictions of behavior or outcomes.
Attribution researchers, however, have recently begun to explore a variety of
factors that influence the credibility of information such as "base rates".
For example, the sample size taken in determining the base rate (Kassin,
1979a), the randomness of a sample (Hansen & Donoghue, 1977), and the perceived
causal relation of the base-rate information to a predicted outcome (Ajzen,
1977; Tversky & Kahneman, 1977) have been examined. As noted by Kassin (1979b),
these studies are somewhat atheoretical, though they do provide evidence that
the credibility of such variables can be influenced. Birnbaum's model of source
credibility has the potential to provide a theoretical umbrella for such
phenomena in social attribution. Examination of credibility effects in
predicting achievement can be viewed as an important step in laying the
groundwork for the study of source credibility effects on attributions.

Models for combining ability and effort 

An averaging model for judgments of performance can be written:

$$R = \frac{w_{IQ} s_{IQ} + w_{ST} s_{ST} + w_0 s_0}{w_{IQ} + w_{ST} + w_0} \qquad (1)$$

where $R$ is the judged performance, $w_{IQ}$ and $w_{ST}$ are weights of IQ
and study time information that depend on the reliability of the information,
$w_0$ is the weight of the initial impression, and $s_0$, $s_{IQ}$ and $s_{ST}$
are the scale values of the initial impression (i.e., expected performance in
the absence of any information), the IQ information and the study time
information, respectively.
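
Equation 1 can be sketched as a small function. This is an illustration only:
the function name, the scale values (16 for IQ, 4 for study time, 10 for the
initial impression) and the weights are hypothetical choices, not estimates
from the experiment.

```python
def averaging_model(s_iq, s_st, w_iq, w_st, s0=10.0, w0=1.0):
    """Equation 1: weighted average of the IQ, study time, and initial impression values."""
    # s0 and w0 defaults are assumptions made for this illustration.
    total_weight = w_iq + w_st + w0
    return (w_iq * s_iq + w_st * s_st + w0 * s0) / total_weight


# Hypothetical stimulus: high IQ (16) paired with low study time (4).
low_reliability_iq = averaging_model(s_iq=16, s_st=4, w_iq=1.0, w_st=1.0)
high_reliability_iq = averaging_model(s_iq=16, s_st=4, w_iq=3.0, w_st=1.0)
print(low_reliability_iq, high_reliability_iq)
```

With these made-up numbers, tripling the IQ weight moves the prediction from
10.0 to 12.4, that is, toward the IQ scale value and away from the study time
value.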

The multiplying model for judgments of performance ($R = s_{IQ} s_{ST}$) makes
no provision for variation in the reliability of IQ and study time
information. One could propose a kind of weighted multiplying model, however:

$$R = (w_{IQ} s_{IQ})(w_{ST} s_{ST}) \qquad (2)$$

Intuitively, this model can be conceptualized as a two-step integration model
in which the subject first combines the weight of each type of information with
the value of it (e.g., weight of IQ combined with value of IQ yields a net
impression of the IQ information). Second, the subject combines the IQ
impression with the study time impression multiplicatively.

The averaging model of Equation 1 and the multiplying model of Equation 2
both predict that as the reliability of a type of information increases, the
effect of that information on the judgment should also increase. For example,
the more reliable the IQ information, the greater the predicted effect of IQ.
This can be seen in that in both equations the weight multiplies the scale
value (e.g., $w_{IQ} s_{IQ}$).

The averaging and multiplying models differ in the predicted effect of the
reliability of one type of information on the impact of other information. The
averaging model predicts that as the reliability of one type of information
increases, the net effect of the other information decreases. For example, as
the reliability of IQ increases, the effect of study time on the judgment
should decrease. This can be seen by considering the relative weights of IQ and
study time. The relative weight of study time,
$w_{ST}/(w_{ST} + w_{IQ} + w_0)$, will decrease as the value of $w_{IQ}$
increases. In contrast, the multiplying model of Equation 2 predicts that
increasing the reliability of one type of information will increase the impact
of the other information on the judgment. The predictions of the models differ
in other ways as well. For example, Equation 2 predicts a four-way interaction
while Equation 1 does not. Equation 2 also predicts a bilinear interaction of
the levels of IQ with the levels of study time where Equation 1 does not. In
addition, the averaging model predicts that the amount of information presented
will influence the impact of each piece of information (the set size effect)
while the multiplying model does not.¹
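
The opposite predictions of the two models can be made explicit with a short
derivation. This is a sketch rather than part of the original analysis: it
assumes positive weights and scale values, treats the scale values as
continuous, reads Equation 2 as $R = (w_{IQ} s_{IQ})(w_{ST} s_{ST})$, and takes
the "effect of study time" to be the rate of change of $R$ with respect to
$s_{ST}$.

```latex
% Averaging model (Equation 1)
\frac{\partial R}{\partial s_{ST}} = \frac{w_{ST}}{w_{IQ} + w_{ST} + w_0},
\qquad
\frac{\partial}{\partial w_{IQ}}\!\left(\frac{\partial R}{\partial s_{ST}}\right)
  = -\frac{w_{ST}}{(w_{IQ} + w_{ST} + w_0)^2} < 0

% Weighted multiplying model (Equation 2)
\frac{\partial R}{\partial s_{ST}} = w_{IQ}\, w_{ST}\, s_{IQ},
\qquad
\frac{\partial}{\partial w_{IQ}}\!\left(\frac{\partial R}{\partial s_{ST}}\right)
  = w_{ST}\, s_{IQ} > 0
```

Under these assumptions, raising $w_{IQ}$ shrinks the study time effect in the
averaging model and enlarges it in the multiplying model.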

METHOD 

Instructions 

Subjects were told that the purpose of the experiment was to examine how 
people use information about a student's ability and effort to predict performance 
on an exam. The exam was described as a comprehensive final in a college course 
that was of medium difficulty. 

IQ information. The instructions stated that information about a student's
intellectual ability would be given in terms of an IQ score, and that in
different cases the IQ score was obtained from test procedures that differed in
reliability. The low reliability IQ test scores were described as based on a
short written, group administered IQ test taking only 10 minutes. The short IQ
test was described as open to many sources of possible error, e.g., lack of
attention to the test, luck in guessing correct answers, etc. The instructions
also stated that while the short IQ test provides some information about a
student's intelligence, it is the most likely to be in error. The medium
reliability IQ test scores were described as based on an individually
administered test, requiring about an hour. This test was described as more
likely to give a good indication of a student's true intelligence because of
the larger number of items and the fact that the test is individually
administered. The high reliability IQ test scores were described as based on
three repeated administrations of the medium reliability IQ test, using a
different form of the test each time. The instructions stated that the average
of the three scores provided a highly reliable measure of true IQ because of
the large variety of test items, administration of the test on three separate
days, etc. This procedure was described as producing an IQ score that is "as
close as you can get to the student's true IQ."

Study time information. Information about study time was given in terms of
how much the student studied for the course compared to others. This
information was described as obtained by having students record their amount of
studying for various periods of time. Subjects were told to assume that all
students reported their study time truthfully. The low reliability study time
estimate was described as based on the amount of time the student spent
studying for the course for one randomly selected day during the semester. This
estimate was described as not a very reliable estimate of overall effort in the
course. Factors such as exams in other courses or other activities may have
conflicted with the student's study effort on that day. Similarly, a high study
time for a single day may not be a good indicator because the day may be
atypical. The medium reliability study time estimate was described as based on
recorded study time for a whole week during the semester. This procedure was
described as more likely to give a reliable indicator of overall study effort
than the one day estimate. The high reliability study time estimate was
described as based on recorded study time for a whole month during the
semester. This procedure was described as the most likely to give a reliable
estimate of the student's overall effort in the course.

Design 

There were 144 trials generated by a 3 (Reliability of IQ) x 4 (Level of IQ) x
3 (Reliability of Study Time) x 4 (Level of Study Time) factorial design. The
levels of IQ were verbally described as well below average, somewhat below
average, somewhat above average, and well above average. The 4 levels of Study
Time were described in the same way. In addition there were 24 trials generated
by a 3 (Reliability of IQ) x 4 (Level of IQ) design and a 3 (Reliability of
Study Time) x 4 (Level of Study Time) design. These 168 trials were randomly
ordered and printed in booklets. The IQ information was printed above the study
time information on each trial. The experimental trials were preceded by 22
practice trials, which included some stimuli more extreme than those of the
main design (e.g., "extremely above average" or "extremely below average").
Each subject worked at his own pace, with most completing the experiment in
approximately one hour.
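
The trial counts above can be checked by enumerating the design. The sketch
below is only an illustration: the tuple layout, the use of None for an omitted
cue, and the shuffle seed are assumptions, not details of how the booklets were
actually produced.

```python
from itertools import product
import random

RELIABILITY = ("low", "medium", "high")
LEVELS = ("well below average", "somewhat below average",
          "somewhat above average", "well above average")


def build_trials(seed=0):
    """Return the shuffled trial list; None marks a cue that is omitted."""
    # Full 3 x 4 x 3 x 4 factorial: both cues present.
    both = list(product(RELIABILITY, LEVELS, RELIABILITY, LEVELS))
    # Single-cue designs: 3 x 4 for IQ alone and 3 x 4 for study time alone.
    iq_only = [(r, lvl, None, None) for r, lvl in product(RELIABILITY, LEVELS)]
    st_only = [(None, None, r, lvl) for r, lvl in product(RELIABILITY, LEVELS)]
    trials = both + iq_only + st_only
    random.Random(seed).shuffle(trials)  # seed is an arbitrary choice for repeatability
    return trials


trials = build_trials()
n_both = sum(1 for t in trials if t[0] is not None and t[2] is not None)
n_single = len(trials) - n_both
print(n_both, n_single, len(trials))
```

The three printed counts correspond to the 144 two-cue trials, the 24
single-cue trials, and the 168 trials in total.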

Rating Scale 

The subjects judged performance using integers between 1 and 19, with labels
varying from 1 = extremely below average performance, through 10 = average, to
19 = extremely above average performance.

Subjects 

The subjects were 65 undergraduate students at the University of Wisconsin
who participated for extra credit in an introductory psychology course. There
were 16 males and 49 females.



RESULTS 

Test of the averaging model 

The lefthand panel of Figure 1 presents the effects of IQ and IQ reliability
on judged performance (averaged across study time and study time reliability).
As predicted by both the multiplying and averaging models, as IQ reliability
increases the effect of the level of IQ increases. This is also true for the
effect of study time and study time reliability, which are presented in the
righthand panel of Figure 1 (averaged over the levels of IQ and IQ
reliability). The IQ x IQ reliability interaction was significant
(F(6, 384) = 120.94), as was the Study Time x Study Time reliability
interaction (F(6, 384) = 108.59).



Insert Figures 1 & 2 about here 



Figure 2 presents the evidence which distinguishes the averaging from the
multiplying model. The lefthand panel of Figure 2 presents the mean judgments
of performance as a function of the level of IQ (abscissa) with a different
curve for each level of study time reliability. It can be seen that the higher
the reliability of study time, the lower the effect of the level of IQ. This
finding is predicted by the averaging model, but is contrary to the multiplying
model. The Study Time reliability x IQ interaction was significant
(F(6, 384) = 13.41). The IQ reliability x Study Time interaction was also
significant (F(6, 384) = 20.41), and also agrees with the predictions of the
averaging model (see the righthand panel of Figure 2). The higher the IQ
reliability, the lower the effect of Study Time.

Figure 3 presents the mean judgments of exam performance for the complete
3 x 4 x 3 x 4 design. The 16 points in each panel are the 4 x 4 combinations of
IQ and Study Time for one level of IQ reliability combined with one level of
Study Time reliability. In each panel, IQ is on the abscissa, and there is a
separate curve for each level of study time. The panels in the top row are the
mean judgments for the low level of IQ reliability, the middle row for medium
IQ reliability, and the bottom row for high IQ reliability. The level of study
time reliability increases across the panels from left to right.

The data of Figure 3 can be seen to agree with the predictions of the
averaging model. As the level of Study Time reliability increases (as one moves
from the left panel to the right panel within each row) the spread of the
curves increases. This follows from the fact that the spread of the curves in
each panel should be related to the relative weight of study time. Similarly,
the effect of IQ reliability can be seen by examining the change in slope
within each column. The curves are steeper in the bottom row than in the top
row. The effect of IQ reliability can be seen to decrease the effect of study
time by noting that within each column of panels, the steeper the slope the
smaller the spread of the curves. This follows from the averaging model since
increasing the absolute weight of IQ ($w_{IQ}$) should decrease the relative
weight of Study Time [$w_{ST}/(w_{ST} + w_{IQ} + w_0)$]. The need for the
initial impression in Equation 1 can be seen by examining the panels in the
diagonal of Figure 3. In the upper left corner, where the reliability of both
cues is low, neither the slope nor the spread of the curves is great. In
contrast, in the lower right panel where the reliability of both cues is high,
both the slope of the curves and the spread of the curves are great. This is
predicted nicely by the relative weight averaging model since the effective
weight of the initial impression [$w_0/(w_{IQ} + w_{ST} + w_0)$] should
decrease as the values of either $w_{IQ}$ or $w_{ST}$ increase.



Insert Figure 3 about here 



The four-way interaction predicted by the multiplying model did not
materialize (F(36, 2304) = 1.08). There was a significant IQ x Study Time
interaction (F(9, 576) = 5.84), however. This interaction is due to the fact
that, averaged over the levels of IQ reliability and Study Time reliability,
the curves converge slightly as the level of IQ increases. This interaction
differs from Anderson and Butzin's (1974) and Kun et al.'s (1974) results, and
is inconsistent with a multiplying model, but is consistent with other findings
(Singh et al., 1979; Surber, Note 1). An averaging model can account for such
an interaction if the weights are allowed to vary with the scale values (see
Birnbaum & Stegner, 1979, for a discussion of configural versus differentially
weighted averaging models). There was also a significant interaction of IQ
reliability x IQ x Study Time reliability (F(12, 768) = 2.97). This interaction
was small and appeared to be due to variations in the size of the interaction
of Study Time reliability with IQ across levels of IQ reliability. These
effects did not appear to be systematic or serious enough to merit further
consideration.
Set size effects



An averaging model also predicts that the relative weight of information
depends on the number of other pieces of information presented with it. Neither
a multiplying nor an additive model predicts effects of the number of sources
of information combined. The set size effects predicted by the averaging model
can be tested in the present experiment by comparing the effect of Study Time
information presented alone with its effect when combined with IQ (and vice
versa). Figure 4 presents the mean judgments for the IQ x IQ reliability and
the Study Time x Study Time reliability designs. According to the averaging
model, the ordinate variation in each panel of Figure 4 should be greater than
the ordinate variation in the corresponding panels of Figure 1 (see Birnbaum et
al., 1976, Experiment II). This can be shown by a comparison of the relative
weights of the information presented alone (e.g., $w_{IQ}/(w_{IQ} + w_0)$)
versus in combination with other information
($w_{IQ}/(w_{IQ} + w_{ST} + w_0)$). Comparison of Figure 4 with Figure 1
reveals that these predictions of the relative weight averaging model hold for
the present experiment.

The averaging model of Equation 1 was fit to the mean judgments using
subroutine STEPIT (Chandler, 1969) to minimize the sum of squared deviations.
The weight of the initial impression was set to 1.0. The overall root mean
squared error was .290 across the 168 data points. This compares well with the
standard errors of the means, which ranged from .115 to .440. Thus, the model
predicts the mean judgments within the range of a standard error. The estimated
weights of IQ for the three levels of IQ reliability were .397, .793 and 1.040,
compared with .317, .564 and .819 for the weights of the three levels of Study
Time reliability.
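
The estimated weights can be turned into relative weights to illustrate the
predictions discussed above. The sketch below uses only the reported estimates
and the fixed initial-impression weight of 1.0; the function names and the
choice of which combinations to print are assumptions made for illustration.

```python
W0 = 1.0  # weight of the initial impression, fixed at 1.0 in the reported fit
W_IQ = {"low": 0.397, "medium": 0.793, "high": 1.040}   # reported IQ weights
W_ST = {"low": 0.317, "medium": 0.564, "high": 0.819}   # reported study time weights


def relative_weight(w_cue, w_other=0.0, w0=W0):
    """Relative weight of a cue; w_other = 0.0 means the other cue is absent."""
    return w_cue / (w_cue + w_other + w0)


def initial_impression_weight(w_iq, w_st, w0=W0):
    """Effective weight of the initial impression when both cues are present."""
    return w0 / (w_iq + w_st + w0)


for rel in ("low", "medium", "high"):
    alone = relative_weight(W_IQ[rel])
    with_low_st = relative_weight(W_IQ[rel], W_ST["low"])
    with_high_st = relative_weight(W_IQ[rel], W_ST["high"])
    print(f"IQ {rel:>6}: alone={alone:.3f}  +low ST={with_low_st:.3f}  +high ST={with_high_st:.3f}")

print(f"initial impression, both low:  {initial_impression_weight(W_IQ['low'], W_ST['low']):.3f}")
print(f"initial impression, both high: {initial_impression_weight(W_IQ['high'], W_ST['high']):.3f}")
```

For every level of IQ reliability, the relative weight of IQ is largest when IQ
is presented alone (the set size effect) and smallest when it is paired with
highly reliable study time information. The effective weight of the initial
impression is smaller when both cues are highly reliable than when both are of
low reliability.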

DISCUSSION

The data of the present experiment provide evidence in favor of the averaging
model as a representation of the way ability and effort information are
combined. In contrast to previous results, this conclusion does not depend on
the set size effect, although the set size results agree with the averaging
predictions. Thus, it appears that Singh et al. (1979) may have been too hasty
in concluding that the Indian and American cultures differ in how they view
ability and effort as determining performance.

The fact that the interaction of ability and effort in the present work
was not the diverging pattern found by both Anderson and Butzin (1974) and Kun
et al. (1974) requires discussion. One possible reason for the difference is
that Anderson and Butzin described the tasks as extremely difficult. For
example, the instructions for judging graduate school performance stated, "A
disturbingly large number of graduate students do not last beyond the first
year of study." In the present experiment, the difficulty of the test was
purposely described as medium so that the results would be representative of
college students' views of performance in college courses. Results similar to
the present ones have been obtained in 3 other experiments in which college
students judged academic performance (Surber, Note 1, Note 2). Singh et al.'s
work, which produced results closely resembling those of the present
experiment, included no special instructions pertaining to difficulty. The task
was described as performance during the first year engineering curriculum, and
the subjects making the judgments were second year engineering students. It may
be reasonable to assume that the second-year students regarded the first year
curriculum as of medium difficulty, since they are not the ones who flunked
out. Based on this analysis, task difficulty may influence the way ability and
effort are subjectively combined to predict performance (cf. Kun & Weiner,
1973). This hypothesis has been tested more recently by Surber (Note 2). It is
possible that prediction of performance in high difficulty tasks is better
represented by a multiplying model. In Surber's (Note 2) experiment, however,
tests of the set size effects conformed to predictions of the averaging model
even when bilinear interactions were obtained. Thus, Anderson and Butzin's
alternative interpretation of their results as consistent with a differentially
weighted averaging model appears to be preferable.

Implications for heuristics of judgment

Recently, Ross (1977) discussed the topic of "attributional biases in
prediction," employing a variety of heuristic concepts such as
representativeness, availability, anchoring and adjustment, concrete vs.
abstract information, correlation error, regression error, conservatism and
nonconservatism. The approach of the present study suggests an alternative to
enumerating judgmental heuristics in predicting outcomes. Most of these
heuristics can be re-expressed as predictions of algebraic models of judgment.

The translation of heuristic concepts into algebraic models can be best
illustrated by Birnbaum's (1976) numerical prediction task, since it is
possible to calculate the optimal statistical solution, allowing evaluation of
so-called biases. As pointed out by Birnbaum (1976), an averaging model of
source reliability effects can predict judgments that others might describe as
conservative, counterconservative, optimal or representative. For example,
intuitive predictions that agree with an averaging model are consistent with
the notion of representativeness, since the intuitive average of two sources of
information seems representative of the information. Similarly, the judgments
based on single cues in Birnbaum's (1976) study could be called
counterconservative or nonconservative since the regression weights were higher
than the optimal weights in this condition. Such overuse of correlated cues has
also been called the "regression error" by Ross (1977). Conservatism (Peterson
& Beach, 1967) can be seen in the fact that Birnbaum found regression weights
for one condition of multiple cue predictions to be smaller than the optimal
weights. For another condition of multiple cue predictions Birnbaum found
regression weights that were approximately optimal. Some of the findings can
also be interpreted as an anchoring effect. When a high reliability source was
combined with a low reliability source, the judgments were displaced toward the
value of the high reliability information. The subject's judgment in this case
could be said to be more firmly anchored at a high reliability value, producing
less adjustment. Happily, all the results in Birnbaum's numerical prediction
task can be predicted by an averaging model in which the weights of the cues
depend on the reliability of the cues. Thus, the model can provide a unifying
theoretical framework for predicting when the effects described by the various
heuristics will occur.

Since much of the analysis of Birnbaum's numerical prediction task applies
analogously to the present experiment, the potential of the model for social
attribution is evident. By extending models of source credibility effects to
predictions of achievement, the present research suggests a variety of
experiments on source credibility in attribution. The most immediate extension
would be to examine the effects of information reliability on attributions of
ability and effort in an experiment analogous to the present one. For example,
the reliability of information about performance and study time might be
manipulated while asking for attributions of IQ. A common assumption of
attribution theories is that how causes are regarded as determining an effect
has an influence on attributions for the effect (Kelley, 1972; Reeder & Brewer,
1979; Zuckerman & Mann, 1979). Based on this assumption (albeit a questionable
one), one might expect to find effects of information reliability on ability
and effort attributions that parallel the present results.

Proposing that an averaging model has the potential to describe a variety of
source credibility effects in attribution does not mean that it necessarily
will be successful. Application of algebraic models of judgment to the effects
of variables such as concreteness-abstractness of information, the perceived
causal relation of information to outcome, randomness and/or size of samples
represented by base-rate information, etc., can serve several purposes. First,
it will provide a theoretical context for unifying a set of phenomena in
attribution. Second, it will help to discover and define the boundaries of
algebraic models of attribution. Third, such research will provide enriched
empirical interpretations for the parameters in the models and by doing so
should stimulate research into the cognitive processes behind the models (cf.
Graesser & Anderson, 1974; Lopes & Ekberg, Note 3; Slovic, Fischhoff &
Lichtenstein, 1977).

Footnotes

¹The multiplying model really makes no predictions about the effect of set size
on judgment. By assuming that omitted information is replaced by the identity
operator, however, the multiplying model predicts that information presented
alone should have the same net impact as when combined with other information.
Ordinally, the set size predictions of the multiplying model are the same as
the ordinal set size predictions of the additive model. The additive model has
not been elaborated because it has been compared with the averaging model in
detail elsewhere (Birnbaum, 1976; Birnbaum, Wong & Wong, 1976).
Figure 1. Mean judgments of exam performance as a function of IQ and IQ
reliability (left hand panel) and Study Time and Study Time reliability 
(right hand panel). 



Figure 2. Mean judgments of exam performance as a function of IQ and Study 
Time reliability (left hand panel) and Study Time and IQ reliability (right
hand panel). Note that the order of curves in each panel of Figure 2 is
the reverse of the order in Figure 1, as predicted by an averaging model. 
Figure 3. Mean judgments of exam performance as
a function of Study Time and IQ information. Each
row of panels represents a different IQ reliability;
each column of panels represents a different Study
Time reliability. In each panel, each solid curve
is a different level of Study Time. IQ levels are
on the abscissa.
Figure 4. Mean judgments of exam performance for the IQ x
IQ reliability design (left panel) and the Study Time x
Study Time reliability design (right panel).




